AITopics | variable name

Collaborating Authors

variable name

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Causal Discovery and Inference through Next-Token Prediction

Neural Information Processing SystemsJun-17-2026, 19:47:26 GMT

Deep neural networks have been criticized as fundamentally statistical systems that fail to capture causal structure and perform causal reasoning. Here we demonstrate that a GPT-style transformer trained for next-token prediction can simultaneously discover instances of linear Gaussian structural causal models (SCMs) and learn to answer counterfactual queries about those SCMs. First, we show that the network generalizes to counterfactual queries about SCMs for which it has seen interventional data but not any examples of counterfactual inference. The network must, thus, have successfully composed discovered causal structures with a learned counterfactual inference algorithm. Second, we decode the implicit "mental" SCM from the network's residual stream activations and manipulate it using gradient descent with predictable effects on the network's output. Our results suggest that statistical prediction may be sufficient to drive the emergence of internal causal models and causal inference capacities in deep neural networks.

artificial intelligence, machine learning, scm, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CARE: Turning LLMs Into Causal Reasoning Expert

Dong, Juncheng, Liu, Yiling, Aloui, Ahmed, Tarokh, Vahid, Carlson, David

arXiv.org Artificial IntelligenceNov-21-2025

Large language models (LLMs) have recently demonstrated impressive capabilities across a range of reasoning and generation tasks. However, research studies have shown that LLMs lack the ability to identify causal relationships, a fundamental cornerstone of human intelligence. We first conduct an exploratory investigation of LLMs' behavior when asked to perform a causal-discovery task and find that they mostly rely on the semantic meaning of variable names, ignoring the observation data. This is unsurprising, given that LLMs were never trained to process structural datasets. To first tackle this challenge, we prompt the LLMs with the outputs of established causal discovery algorithms designed for observational datasets. These algorithm outputs effectively serve as the sufficient statistics of the observation data. However, quite surprisingly, we find that prompting the LLMs with these sufficient statistics decreases the LLMs' performance in causal discovery. To address this current limitation, we propose CARE, a framework that enhances LLMs' causal-reasoning ability by teaching them to effectively utilize the outputs of established causal-discovery algorithms through supervised fine-tuning. Experimental results show that a finetuned Qwen2.5-1.5B model produced by CARE significantly outperforms both traditional causal-discovery algorithms and state-of-the-art LLMs with over a thousand times more parameters, demonstrating effective utilization of its own knowledge and the external algorithmic clues.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2511.16016

Country: Asia (0.15)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages Marie-Anne Lachaux

Neural Information Processing SystemsAug-15-2025, 10:32:50 GMT

In the original approach proposed by Devlin et al. [2018], a fraction of selected masked words is

arxiv preprint arxiv, identifier name, objective, (13 more...)

Neural Information Processing Systems

Country: Europe > France (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation

Beau, Nathanaël, Crabbé, Benoît

arXiv.org Artificial IntelligenceApr-10-2025

As text and code resources have expanded, large-scale pre-trained models have shown promising capabilities in code generation tasks, typically employing supervised fine-tuning with problem statement-program pairs. However, increasing model size and data volume for performance gains also raises computational demands and risks of overfitting. Addressing these challenges, we present RETROcode, a novel adaptation of the RETRO architecture \cite{RETRO} for sequence-to-sequence models, utilizing a large code database as an auxiliary scaling method. This approach, diverging from simply enlarging model and dataset sizes, allows RETROcode to leverage a vast code database for prediction, enhancing the model's efficiency by integrating extensive memory. Our findings indicate that RETROcode not only outperforms similar-sized traditional architectures on test sets but also approaches the effectiveness of the much larger Codex model, despite being trained from scratch on a substantially smaller dataset.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2504.05759

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Leveraging Large Language Models for Automated Causal Loop Diagram Generation: Enhancing System Dynamics Modeling through Curated Prompting Techniques

Liu, Ning-Yuan Georgia, Keith, David R.

arXiv.org Artificial IntelligenceMar-23-2025

T ransforming a dynamic hypothesis into a causal loop diagram (CLD) is crucial for System Dynamics Modelling. Extracting key variables and causal relationships from text to build a CLD is often challenging and time - consuming for novice modelers, limiting SD tool adoption. This paper introduces and tests a method for automating the translation of dynamic hypotheses into CLDs using large language models (LLMs) with curated prompting techniques. We first describe how LLMs work and how they can make the inferences needed to build CLDs using a standard digraph structure. Next, we develop a set of simple dynamic hypothe ses and corresponding CLDs from leading SD textbooks. We then compare the four different combinations of prompting technique s, evaluating their performance against CLD s labeled by expert modelers . Results show that for simple model structures and using curated prompting techniques, LLMs can generate CLDs of a similar quality to expert - built ones, accelerating CLD creation.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.21798

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Measuring Similarity in Causal Graphs: A Framework for Semantic and Structural Analysis

Liu, Ning-Yuan Georgia, Yang, Flower, Jalali, Mohammad S.

arXiv.org Artificial IntelligenceMar-13-2025

Causal graphs are commonly used to understand and model complex systems. Researchers often construct these graphs from different perspectives, leading to significant variations for the same problem. Comparing causal graphs is, therefore, essential for evaluating assumptions, integrating insights, and resolving disagreements. The rise of AI tools has further amplified this need, as they are increasingly used to generate hypothesized causal graphs by synthesizing information from various sources such as prior research and community inputs, providing the potential for automating and scaling causal modeling for complex systems. Similar to humans, these tools also produce inconsistent results across platforms, versions, and iterations. Despite its importance, research on causal graph comparison remains scarce. Existing methods often focus solely on structural similarities, assuming identical variable names, and fail to capture nuanced semantic relationships, which is essential for causal graph comparison. We address these gaps by investigating methods for comparing causal graphs from both semantic and structural perspectives. First, we reviewed over 40 existing metrics and, based on predefined criteria, selected nine for evaluation from two threads of machine learning: four semantic similarity metrics and five learning graph kernels. We discuss the usability of these metrics in simple examples to illustrate their strengths and limitations. We then generated a synthetic dataset of 2,000 causal graphs using generative AI based on a reference diagram. Our findings reveal that each metric captures a different aspect of similarity, highlighting the need to use multiple metrics.

causal graph, graph, similarity, (17 more...)

arXiv.org Artificial Intelligence

2503.11046

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
(9 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models

Guan, Batu, Wu, Xiao, Yuan, Yuanyuan, Li, Shaohua

arXiv.org Artificial IntelligenceMar-9-2025

In this paper, we tackle a critical challenge in model evaluation: how to keep code benchmarks useful when models might have already seen them during training. We introduce a novel solution, dynamic benchmarking framework, to address this challenge. Given a code understanding or reasoning benchmark, our framework dynamically transforms each input, i.e., programs, with various semantic-preserving mutations to build a syntactically new while semantically identical benchmark. We evaluated ten popular language models on our dynamic benchmarks. Our evaluation reveals several interesting or surprising findings: (1) all models perform significantly worse than before, (2) the ranking between some models shifts dramatically, and (3) our dynamic benchmarks can resist against the data contamination problem.

benchmark, mutation, qwen2, (13 more...)

arXiv.org Artificial Intelligence

2503.06643

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.74)

Add feedback

CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation

Yan, Kaiwen, Guo, Hongcheng, Shi, Xuanqing, Xu, Jingyi, Gu, Yaonan, Li, Zhoujun

arXiv.org Artificial IntelligenceMar-5-2025

With the rapid advancement of Large Language Models (LLMs), the demand for robust instruction-following capabilities in code generation tasks has grown significantly. Code generation not only facilitates faster prototyping and automated testing, but also augments developer efficiency through improved maintainability and reusability of code. In this paper, we introduce CodeIF, the first benchmark specifically designed to assess the abilities of LLMs to adhere to task-oriented instructions within diverse code generation scenarios. CodeIF encompasses a broad range of tasks, including function synthesis, error debugging, algorithmic refactoring, and code explanation, thereby providing a comprehensive suite to evaluate model performance across varying complexity levels and programming domains. We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks. The experimental results offer valuable insights into how well current models align with human instructions, as well as the extent to which they can generate consistent, maintainable, and contextually relevant code. Our findings not only underscore the critical role that instruction-following LLMs can play in modern software development, but also illuminate pathways for future research aimed at enhancing their adaptability, reliability, and overall effectiveness in automated code generation.

constraint, instruction, qwen2, (15 more...)

arXiv.org Artificial Intelligence

2502.19166

Country:

Europe > Austria > Vienna (0.14)
North America > Dominican Republic (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Rakotonirina, Nathanaël Carraz, Hamdy, Mohammed, Campos, Jon Ander, Weber, Lucas, Testoni, Alberto, Fadaee, Marzieh, Pezzelle, Sandro, Del Tredici, Marco

arXiv.org Artificial IntelligenceFeb-19-2025

Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.

information, instruction, pablo, (14 more...)

arXiv.org Artificial Intelligence

2502.13791

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Next Steps in LLM-Supported Java Verification

Teuber, Samuel, Beckert, Bernhard

arXiv.org Artificial IntelligenceFeb-3-2025

Recent work has shown that Large Language Models (LLMs) are not only a suitable tool for code generation but also capable of generating annotation-based code specifications. Scaling these methodologies may allow us to deduce provable correctness guarantees for large-scale software systems. In comparison to other LLM tasks, the application field of deductive verification has the notable advantage of providing a rigorous toolset to check LLM-generated solutions. This short paper provides early results on how this rigorous toolset can be used to reliably elicit correct specification annotations from an unreliable LLM oracle.

annotation, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.01573

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
Europe > Greece (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback